Percolation on interacting networks 
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Most networks of interest do not live in isolation. Instead they form components of larger systems 
in which multiple networks with distinct topologies coexist and where elements distributed amongst 
different networks may interact directly. Here we develop a mathematical framework based on 
generating functions for analyzing a system of / interacting networks given the connectivity within 
and between networks. We derive exact expressions for the percolation threshold describing the onset 
of large-scale connectivity in the system of networks and each network individually. These general 
expressions apply to networks with arbitrary degree distributions and we explicitly evaluate them for 
I = 2 interacting networks with a few choices of degree distributions. We show that the percolation 
threshold in an individual network can be significantly lowered once "hidden" connections to other 
networks are considered. We show applications of the framework to two real- world systems involving 
communications networks and socio-tecnical congruence in software systems. 

PACS numbers: 64.60.aq, 89.75.Fb 



In the past decade there has been a significant advance 
in understanding the structure and function of networks. 
Mathematical models of networks are now widely used to 
describe a broad range of complex systems, from spread 
of disease on networks of human contacts to interactions 
amongst proteins [TJ [2J [3]. However, current methods 
deal almost exclusively with individual networks treated 
as isolated systems. In reality an individual network is 
often just one component in a much larger complex sys- 
tem; a system that can bring together multiple networks 
with distinct topologies and functions. For instance, a 
pathogen spreads on a network of human contacts abet- 
ted by global and regional transportation networks. Like- 
wise, email and e-commerce networks rely on the Internet 
which in turn relies on the electric grid. In biological sys- 
tems, activated genes give rise to proteins some of which 
go back to the genetic level and activate or inhibit other 
genes. Results obtained in the context of a single isolated 
network can change dramatically once interactions with 
other networks are incorporated. 

Consider a system formed by two interacting networks, 
a and f3, Fig 1(a) Network a could be a human contact 



network for one geographic region and network (3 that for 
a separated region. When viewed as individual systems, 
only small clusters of connected nodes exist, hence, a dis- 
ease spreading in cither network should stay contained 
within clusters. In reality, a disease can hop from a to (3, 
for instance, by an infected person flying on a airplane, 
spread in the (3 network and eventually hop back to the 
a network into new clusters, causing an epidemic out- 
break. Next consider interacting networks that contain 
completely different types of nodes. Network a can be 
a social network, such as an email communication net- 
work of software developers, while network (3 can be a 
technological network, such as the network of calls be- 
tween functions in software code. Here, bi-partite edges 
connect developers on a to code they author on (3. 
An important step towards modeling interacting net- 



works was introduced with the layered network frame- 
work of @]. Yet, the networks in the distinct layers must 
be composed of the identical nodes (modeling essentially 
physical connectivity and logical connectivity or flow). 
Herein we consider systems of I > 2 distinct interact- 
ing networks and calculate explicitly how the connectiv- 
ity within and between networks determines the onset of 
large scale connectivity in the system and in each net- 
work individually. Our mathematical formulation has 
some overlap with recent works calculating connectivity 
properties in a single network accounting for a diversity 
of node attributes |5J [5] or interactions between modules 
within a network [TJ |S] . Here we present our formalism 
and also applications to real-world systems of interacting 
networks coming from telecommunications and software. 

The onset of large-scale connectivity (i.e., the percola- 
tion threshold) corresponding to the emergence of a giant 
connected component in an isolated network has been 
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FIG. 1: a) Two networks a and (5. Nodes interact directly 
with other nodes in their immediate network, yet also with 
nodes in the second network, b) An illustration of the re- 
maining edges incident to a node in a network fi reached by 
following a random edge between networks v and fi. 



2 



studied extensively, first for random networks with Pois- 
son degree distributions [9] and later for random networks 
with arbitrary degree distributions [10]. Similar results 
were then derived using generating functions 112] , the 
approach we employ herein. Generating functions, simi- 
lar to the network configuration model [TOj [13] , evaluate 
the ensemble of all possible random networks consistent 
with a specified degree distribution, {pk}, and are most 
accurate in the sparse regime where networks are approx- 
imately tree-like. Thus in the regime before the emer- 
gence of the giant component, generating functions can 
be used to calculate the distribution of component sizes. 
In the supercritical regime they can be used to calculate 
the distribution in sizes of components that are not part 
of the giant component. 

For our purposes, a system with / > 2 interacting net- 
works is described by a set of degree distributions. Each 
individual network /i is characterized by a multi-degree 

distribution, {Pk 1 k 2 ---ki K wnere Pk 1 k 2 ---k l ' s ^ ne fraction 
of all nodes in network n that have k\ edges to nodes 
in network 1, k^ edges to nodes in network 2, etc. The 
multi-degree distribution for each network may be writ- 
ten in the form of a generating function: 
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To simplify notation in what follows, we now define two 
Z-tuple's, x = (xi, . . . ,x/) and 1 = (1, . . . , 1). 

Our interest is in calculating the distribution of compo- 
nent sizes, where a component is a set of nodes connected 
to one another either directly or indirectly by travers- 
ing a path along edges. Clearly such components can 
be composed of nodes distributed among the I different 
networks, and our formulation allows us to calculate the 
distribution of such system-wide components, yet also to 
refine the focus and calculate the contribution coming 
from nodes contained in only one of the I networks. 

We begin by deriving the distribution of connectiv- 
ity for a node at the end of a randomly chosen edge. 
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FIG. 2: Comparison over time of the distribution of the num- 
ber of developers connected indirectly via co-editing code in 
the Apache project with the distribution expected in a ran- 
dom network with the same multi-degree distribution. Verti- 
cal lines mark the first generally available release in 2002, and 
a significant deviation from random in 2003, when the com- 
munication network shrinks and the project seems to become 
more efficiently organized. 



Consider selecting uniformly at random an edge falling 
between a node in network v and a node in network 
fi (i.e., a v-y, edge). The \i node attached to the edge 
is k u times more likely to have ^-degree k v than de- 
gree 1. We can also account for the remaining local 
connectivity, to nodes in other networks as shown in 
Fig. 1(b) In single isolated networks remaining con- 



nectivity is called the excess degree of a node |llj . Let 
1k U —k—k denote the probability of following a randomly 
chosen v-fi edge to a node with excess v degree as shown 
in Fig. |l(b) (which has total ^-degree of k v + 1). Then 

C-Av-fc, K ( k » + 1 )Pfci.«(fc„+i)-fc,' and thc generating 
function for the distribution, } is, 
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where G^(x) denotes the first derivative of G M (x) with 
respect to x u and the denominator is a normalization 
constant so that G^^l) = 1. Also note that G"(l) = k^ 
is the average z/-degree for a node in network /i. 

The distribution of second nearest neighbors for 
that fi node via the v layer is calculated by us- 
ing Eq. [2] as the argument to Eq. [T] namely 
G M (l ! l,...,G !/ ^(x)| :Cx=1 ^ M ,...,l). Comparing this dis- 
tribution calculated via generating functions to that 
found in real-world interacting networks can reveal in- 
teresting statistical features. Returning to the software 
example, we have a network of email communication be- 
tween developers, a network of relations between code, 
and bipartite edges connecting developers to the code 
they edit. We would expect that the real system does not 
resemble a random network, but instead reflects a struc- 
ture conducive to project development. For instance, if 
two developers edit the same code we would like for them 
to directly communicate via email and thus be first neigh- 
bors. In a sparse random network these developers would 
typically be second neighbors, connected indirectly via 
the code they both edit. 

We analyze the evolution of the Apache 2.0 Open 
Source Software project from mid-2000 thru 2004, with 
data aggregated over three month windows. From this we 
extracted the multi-degree distribution of the system for 
each time-shot, which we then plug into our generating 
functions to calculate the expected distribution of second 
neighbors found by following first a developer-to-code 
edge then a code-to-developer edge. We then compare 
this distribution to the real distribution of such devel- 
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FIG. 3: A diagramatical representation of the topological con- 
straints placed on the generating function (x) for the dis- 
tribution of sizes of components reachable by following a ran- 
domly chosen u-fj, edge. The labels attached to each edge 
indicate type or flavor of the edge and summation notation 
indicates that we are summing over all possible flavors. 



oper second nearest neighbors using the Jensen-Shannon 
divergence [H], a symmetric measure based on Kullback- 
Leibler divergence. The results are shown in Fig. [2] with 
the JS-score of the real networks normalized by the JS- 
scores from the ensemble of random networks. Values 
greater or less than unity indicate networks more or less 
random than average. We indicate two vertical bars 
where significant difference between the random and real 
networks occurs. The first, in mid-2002 marks the first 
general availability release of Apache 2.0, the second, at 
the start of 2003, is a bug and security fix [T5]. This lat- 
ter point, moreover, marks when a substantial purging 
of developers from the communication network occurs. 
In any three-month window we observe that only about 
25 developers edit code, yet prior to 2003 the number 
of developers in the email network is significantly larger. 
Thus this time seems to indicate when the Apache project 
becomes more efficiently organized, eliminating noise of 
spurious emails to inactive developers. 

We are now in position to consider component sizes. 
Assume we follow a randomly chosen v-\l edge to a /x node 
(Fig. 1(b) I, and consider the distribution in sizes of the 
component found by following the additional outgoing 
edges. Let 7J M „(x) denote the associated generating func- 
tion. Fig. [3] illustrates all the types of connectivity possi- 
ble for the /x-node, and summing over all these possibili- 
ties leads to the self-consistency equation for H^ v (-x): 
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&ij denotes the Kronecker delta, used here to account for 
all combinations of flavors of edges connected to the /x- 
node leading to specified excess degree i. Reordering the 
terms, Eq. [3] becomes 



H IM/ (x)=x„ J2 €-k l H M kl --- H M kl - ( 4 ) 

fci...fei=0 



We recognize the form of this equation from Eq. [2j thus 

ff^(x) = x^G^H^x), fl, M (x)]. (5) 

We now consider starting from a randomly chosen /x- 
node, rather than a random v-\i edge. A topology such as 
one from Fig. [3] exists a the end of each edge incident to 
the /x-node. The generating function for the probability 
distribution of component sizes is, 



flp(x) = avG M [fzV(x), . . . , H Jm (x)]. 



(6) 



While in theory it is possible to solve Eq. [5] for H IM1/ (x) 
and use that solution in Eq. [6] to solve for if M (x), in 
practice, even for the case of a single isolated network, as 
noted in |llj the equations are typically quite difficult to 
solve. Yet, Eq. fallows calculation of average component 
size. A component may include multiple node flavors, but 
we can distinguish between the average number of each 
type. For example, the average number of ^-nodes in the 
component of a randomly chosen /i-node is 
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Intuitively Eq.^is reasonable because H^ x (l) represents 
the average number of ^-nodes in the component found 
by following a /x-A edge towards a A-node, and the ex- 
pected number of /x-A edges incident to an initial /x-node 



is G^ (1) (recall, G^ A (1) = k^). The product of the two 
terms summed over all A networks produces the num- 
ber of ^-nodes in a component connected to a randomly 
chosen /x-node, (s^) v . 

The preceding results regarding components hold in 
the sub-critical regime where no giant connected compo- 
nent exists. Once a giant component emerges, generating 
functions allow us to calculate properties of components 
not belonging to it. The giant component will span mul- 
tiple networks and calculating its size requires accounting 
for the contribution from each network. Let be the 
fraction of /x-nodes belonging to the giant component. 
The probability that a randomly chosen /x-node is not 
part of the giant component must then satisfy the fol- 
lowing equation, 



i-^=E<, 



G M (wi AJ , 



,uin), (8) 



fei ,...,fc,=0 



where u vfl is the probability that an \x-v edge is not part 
of the giant component. In addition, for all /x, v € /, u vll 
must satisfy, 



G vll {u\ Vl . . . , uiv), 



(9) 
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derived using the same self-consistency arguments that 
resulted in Eq. [5] 

Though all the equations above hold for a system of 
I > 2 interacting networks, we now give a concrete ex- 
ample for 1 = 2, with the networks indexed as a and (3. 
Consider first the simplest of systems, where the inter- 
nal connectivity of a and (3 each has a distinct Poisson 
degree distribution, and the inter-network connectivity 
is described by a third Poisson degree distribution, for 



instance, 



>/k a 



(Recall fc^ denotes the average ^-degree for a node in 
network /x.) Then, from Eq. [T| 



G a (x a , Xp) 
Gp(x a ,xp) 



= e fc"(*«-l) e ^«(a/3-l) 
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Using Eq. [7J the average number of a-nodes in a compo- 
nent reachable from a randomly chosen a-node is, 



(s a )c 
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The average component 



( 1 — fc Q ) ( 1 — kg ) = k k a k f3 ; the point at 



size diverges for 

rflj-a 

\,p ) — ^a^p, 

which the giant component emerges. (Ref. [5] recently 
presented an alternate method for deriving similar per- 
colation thresholds and connectivity properties, but in a 
single network with multiple interacting communities.) 
Note, following Eq. [7] we can show (sp) a , (s a )p, and 
(sp)p also all diverge at this point, marking when a giant 
component emerges in each network and throughout 
the system. Further simplifying, by assuming the two 
interacting networks have the same degree distribution, 



k a — k 3 = fcintra and k a = k g = k intcr , then the giant 
component emerges when, fester + ^iatra — 1; recovering 
the standard result for a single network (which, by 
definition, has fcj ntcr = 0) that emergence occurs for 

^-intra 1 • 

Once the giant component emerges the u v>1 which sat- 
isfy Eq. [9] are u aa = u a g = 1 — S a and upp = up a = 
1 — Sp, while S a and Sp, respectively, the number of a- 
nodes and /3-nodes in the giant component of the system, 
satisfy 



S a = l- e -( k " s o+KS p ) 



(13) 
(14) 



To observe the change in connectivity of one network 
precipitated by an increase in connectivity of a second 
network attached to the first, we simulated a system of 

two interacting networks and fixed and Tp while 



varying kp from to 5 (Fig. 



4) 



.As 



increases the 



/3-nctwork becomes a single connected component (the 
traditional behavior for a single network) and Sp — > 1. 
However, the connectivity of a remains limited. It can be 




FIG. 4: Numerical simulations of connectivity in a system 
of two interacting Poisson degree distributed networks, a and 
j3, with inter-network connectivity also Poisson distributed, as 
connectivity on increases. Each network has 100,00 nodes, 
with = 0.4 and fcf = k'p = 0.5. Shown are the fraction of 
a nodes, S a (circles), /3 nodes, Sp (squares), and all nodes, 
S (triangles) in the system-wide giant component, with the 
dashed curves giving the analytic results, Eqns. (13) and (14). 
The horizontal dashed line is the asymptotic value to which 
S a approaches. (Inset) Analogous results when a has Poisson 
distribution with fc^ = 0.5, inter- network edges follow a Pois- 

— /3 —a 

son distribution with k a — kp = 0.4, but p has a power-law 
distribution with exponent r = 2.5 and an exponential cutoff 
that we vary between 1 < k < 300. The solid curve is the 
result for network /3 when viewed in isolation. 



shown that as kg increases S a 



-W 



(dashed horizontal line in Fig. |4j), where W is the Lam- 
bert W function, also known as the product log. 

We next consider more complex degree distribu- 
tions, where a is still described by a Poisson dis- 
tribution, but the internal connectivity of (3 is de- 
scribed by a power-law distribution with an exponen- 
tial cutoff. While power-law degree distributions have 
attracted considerable attention as a model for node 
degree distributions in many types of networks |16j . 
a power-law with an exponential cutoff may be a 
better model for real- world networks [T7]- Here 

Pl^ = [(k a p) k «e^/k a \\ [(kpYe-^^/UAe- 1 ^)] 
where Li„(a;) is the nth polylogarithm of x and serves 
as a normalizing factor for the distribution. Thus, we 
can write our basic generating function for network /3, 

G^,) = ^- ^^ . (15) 
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The generating function for a is still given by Eq 
We simulate the impact on the connectivity of the a- 
network as the exponential cutoff and hence the average 
degree of network (3 increases, inset of Fig. [4] Again 
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FIG. 5: Inset are two sample networks of Bluetooth connec- 
tivity. The main figure shows the increase in participation 
in the giant component as connectivity between a and /3 in- 



creases, starting from k 







0.1. Points are obtained by 



taking the empirical data and simulating inter-network edges 
with the appropriate k a and kg, averaged over 100 realiza- 
tions. The solid line is from analytic calculations. 



the dashed curves are the analytic results obtained by 
solving Eqns. [8] and [9] The solid red line is the behavior 
for the [3 network considered in isolation, showing that 
even the percolation threshold for is lowered through 
connectivity with network a. 

Finally we consider an application of connectivity to 
communications networks, building on the increasing in- 
terest in using Bluetooth connectivity between individ- 
uals to transmit data [IB]. For instance, rather than 
downloading a webpage (such as the CNN homepage) 
by connecting to the Internet, a copy could be obtained 
from a close-by individual already in possession of this 
data. We construct prototypical networks of local Blue- 



tooth connectivity between individuals from raw data of 
Bluetooth sightings by 41 attendees at the 25th IEEEE 
International Conference on Computer Communications 
(INFOCOM) PJ]. We initially partition the raw data 
into discrete 20 minute windows and consider that a 
communication edge exists between any two devices so 
long as they are within contact for at least 120 seconds. 
Each network has approximately a Poisson degree dis- 
tribution of connectivity. We choose two arbitrary 20 
minute snapshots as proxies for two distinct networks, 
a and f3, representing, for instance, two separate rooms 
at the conference. We calculate how adding long-range 
connections between a and f3 (for instance via text mes- 
sages or email) enhances overall connectivity in the sys- 
tem. In other words, we calculate how many long-range 
connections would be needed between two isolated local 
Bluetooth networks to create the desired large scale con- 
nectivity, potentially allowing many users to share infor- 
mation. Figure [5] shows the size of the giant component 
obtained via numerical simulations using the real data 
(points) and the analytic calculations obtained via gen- 
erating functions (dashed line) . The analytic calculations 
slightly overestimate connectivity, yet there is remarkable 
agreement with empirical data even though the actually 
networks are quite small. 

In summary, we have introduced a formalism for cal- 
culating connectivity properties in a system of / interact- 
ing networks. We demonstrate the extreme lowering of 
the percolation threshold possible once interactions with 
other networks are taken into account. This framework 
for calculating connectivity and statistics of interacting 
networks should be broadly applicable, and we show po- 
tential applications to software and communications sys- 
tems. 
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